ORs in the Trigeminal Ganglia

Nguyen et al. 2017 conducted a single-cell genomics study investigating the trigeminal ganglia. I was curious - can we learn anything about DOR expressing cells from this dataset? For starters, let’s take a look at the raw data:

These data are comprised of gene names, the cells in which those genes were find (identified by a DNA barcode), and the number of times that gene was detected in a given cell (its count). Given that, we can check for the frequency of opioid receptors in the dataset:

Unfortunately, opioid receptors make up a tiny portion of the dataset - no more than 4% of cells for any receptor, with Oprd1 the least frequently expressed. However, we can gain a lot of information about what genes are expressed in these specific cells. For example, we can look for genes that are only expressed in cells with opioid receptors:

In total, there are 379 genes that are only expressed in opioid receptor containing cells.

Nick Ryba (the study’s last author), notes the following caveat about these low-expressing cells in their paper:

From our experience just about any gene that is well represented in the single cell data is expressed at high level and can be detected in appropriate cell populations using ISH; the exceptions are genes that are rapidly upregulated by stress/injury (dissociation and isolation of neurons for sequencing). So Fos (not detected in the ganglion) or Gal (low level expression) are high in the sc-data. Obviously genes that seem to dropout like Oprm1 and Oprd1 might just be expressed at a moderate level: basically I would estimate that a normal neuron has >500,000 transcripts and on average we are capturing about 1,500 in the data (i.e. only ~0.2% of total transcripts are sequenced). In addition, Dropseq only sequences a region of the 3’-end (everything barcoded is tethered to the polyA tail); if by chance a gene has a 3’-end that lacks complexity/individuality it will also be under-represented (the sequence needs to match one genomic location and only one gene model from the database); similarly if a gene has a short polyA tail, it might be under-sampled because it fails to be captured. In addition, some transcripts have been suggested to be rapidly translocated to axon termini; those might also be missing. Finally, the data are by no means unbiased; large diameter neurons are under-sampled and by contrast nociceptors over-represented. My guess is that with opioid receptors, it’s a bit of several of these that account for its sparseness; I doubt they are very highly expressed but they might not be as low as the data suggest.

So with that in mind, what can we learn about cells that express DOR?

DOR-associated genes in the TG

First, let’s see which genes are expressed exclusively in DOR-expressing cells comapred to the rest of the sample:

None of these genes have very many reads in DOR cells. Maybe there are genes highly expressed in DOR-cells, even if they’re not unique to DOR cells? Here we look at the top hits for genes expressed in DOR cells:

So, a lot of neuronal genes. These probably won’t give specific labels. It might also be interesting to look at genes that are NEVER expressed in DOR-expressing cells:

This is great, because we know that VIP is a useful neuronal marker, AND it’s expressed at a high frequency in the dataset. VIP should mark non-DOR cells exclusively. But we still don’t have a DOR-specific marker.

In the end, what we’re most curious about is genes that are enriched in DOR-expressing cells over the global cell population. To get this, we use a normalized likelihood ratio. The likelihood ratio is calculated as: \(\frac{gene_{DOR}}{reads_{DOR}} /{\frac{gene_{total}}{reads_{total}}}\). For our total variable, we can select either from the entire dataset, or exclusively cells not expressing DOR. We also want to use a logarithmic transformation, as these data will be dramatically left-clustered. Below, the log-likelihood ratio for enrichment in DOR cells vs. the total dataset is listed as likelihood_total, and enrichment in DOR cells vs. non-DOR cells is listed as likelihood_xdor. I’ve filtered out genes only expressed in DOR cells from this list as they are obviously the most enriched genes.

In order to determine what magnitude of enrichment we should focus on when looking for interesting genes, we can look at these likelihood ratios presented as a histogram. These distributions appear roughly normally distributed:
The distribution of both likelihood metrics plotted as a histogram. Both populations have roughly normal distribution centered around 0.

The distribution of both likelihood metrics plotted as a histogram. Both populations have roughly normal distribution centered around 0.

Selecting a cutoff for where genes become interesting is arbitrary, but to inform ourselves as to what might be meaningful here, we can look at distribution summary statistics. The mean and standard deviation give us an idea of where we should be looking for interesting genes:

With these in mind, let’s glimpse a few DOR genes we know are functionally associated with DOR:

All of these genes are within one standard deviation of the distribution mean, although the trends are as expected for several of them.

You can search the entire list of enriched genes below:

DOR in TG Neuronal Subtypes

Nguyen et al. 2017 show that of the above cells, only about 3,500 cells are ‘neurons’. This cutoff is made based predominantly on the expression of two neuronal genes: Tubb3 and Scn9a. The authors then clustered these data based on gene expression and found several distinct clusters, which could also be thought of as subtypes of cells. Dr. Ryba provided us with the analyzed neuronal data. We can visualize the clusters they detected:

I’ve labeled these clusters based on the broad labels used in the original paper - these are all gene names that the clusters express highly. If all were perfect, we would see that all of our DOR-expressing cells cluster within one specific cluster. We can highlight cells expresing our gene of interest in that other plot. Here, I’ve used both DOR and MOR so that their distributions can be compared:

Delta-expressing cells are in blue, Mu-expressing cells in red, everything else in gray. Note that you can hover over every element above to see which cell and cluster is belongs to. Clearly our cells of interest don’t cluster very nicely in this plot. This makes sense, given that our receptors are not well represented in the dataset - the clustering above is dominated by genes with much higher expression counts. What if we look at relative expression of our opioid receptors across the clusters? This is a bit more enlightening! Based on the descriptions provided in the published data, MOR tends to be expressed in neurons connected to cold-sensation, discriminative touch, nociception, heat sensation, and itch (Trpm8, S100b/Nefh, Calca/Tac1/Trpv1, Trpv1, and itch markers, respectively), while DOR seems enriched in affective C fibers (slow mechanosensation/gentle touch), discriminative touch, itch, and noxious mechanosensation (Cd34, S100b/Nefh, itch markers, and Mrgprd, respectively). There appears to be functional specifalization of these receptors in sensation.

How do our DOR-enriched genes fair? Below is a heatmap of gene expression for the top 150 genes in our DOR-enriched datase. DOR is on the top so you can compare cluster expression: Note that many of them are not displayed - it appears that some of the DOR we found in the initial dataset is in cells that were discarded after filtering for neurons. In fact, of the 45 cells in the full dataset that contain DOR, 11 are not neurons.

At least by my eye, the genes that jump out the most are Thrsp (a thyroid hormone responsive gene taht might regulate transcription factors) and Hs3st2 (a heparan sulfate transferase), both present in high(er) amounts in a cluster of discriminative touch neurons. Given the lack of conclusively identifying genes, it might be useful to look at markers that are prominent in cluster S100b/Nefh-2 and in Mrgprd.

S100b/Nefh1 Cluster

You can see the top 100 markers for the S100b/Nefh-2 cluster:

And we can appreciate that the likelihood distribution of these genes is right shifted in DOR-expressing cells. Given these findings, it would suggest that PCP4 (a regulator of Calmodulin) might be a useful marker for a sub-population of DOR expressing cells.

Mrgprd Cluster

You can see the top 100 markers for the Mrgprd cluster:

The likelihood distribution of these genes is even further right shifted in DOR-expressing cells compared to the S100b/Nefh cluster. Given these findings, it would suggest that Mrgprd (a regulator of Calmodulin) might be a useful marker for a sub-population of DOR expressing cells.

Non-neuronal DOR

Given that almost 20% of our DOR cells were filtered out using the neuronal filter on the above dataset, this might represent a third prominent population of DOR. Again, we calculate the log likelihood of genes expressed in just those cells vs the rest of the dataset:

Unfortunately, many of these look to be neuronal genes so it’s possible this population was erroneously discarded. Not much we can do with these cells.

Conclusions

The above data together suggest that DOR could be theoretically positively labeled by Mrgprd and PCP4, and should not colocalize with VIP in the trigeminal ganglia. Furthermore, the above results provide a good list to crosscheck screening results against, as genes enriched in DOR-expressing cells might represent more interesting hits from the HEK screen.

All this is of course with two major caveats:

  • DOR is underrepresented in the dataset. This is possibly artifiactual and possibly real. It might be useful to run an in situ to determine the frequency of endogenous DOR in our cultures.
  • Of the two classes of DOR-containing cells we can find, there is yet a 3rd class whose nature we are not able to presently identify.